This should be the last technical article on this topic. I actually had today's content planned out from early on: it is a program adapted from a foreign YouTuber's version, and what I added is mainly the part that lets you feed in new data. So let's begin. (I originally wanted to connect it to Discord, but that is a bit complicated and too much to cover in one day, so I left it out.)
A chatbot can hold a conversation with a user. It uses natural language processing (NLP) techniques and algorithms to understand messages and produce human-like responses.
(I am not using a TensorFlow model or any other algorithm here, just some basic arithmetic.)
data2.json
{
  "all_data": [
    {
      "tag": "Hello",
      "responses": [
        "Hello!"
      ],
      "patterns": [
        "hello",
        "hi",
        "hey",
        "sup",
        "heyo"
      ],
      "keyword": []
    },
    {
      "tag": "bye",
      "responses": [
        "Hope to see you again"
      ],
      "patterns": [
        "bye",
        "goodbye"
      ],
      "keyword": []
    },
    {
      "tag": "greeting",
      "responses": [
        "I am doing fine, and you?"
      ],
"patterns": [
"bye",
"goodbye"
],
"keyword": []
},
{
"tag": "thank",
"responses": [
"You are welcome"
],
"patterns": [
"thanks",
"thank",
"helpful"
],
"keyword": []
},
{
"tag": "eat",
"responses": [
"I don't like eating anything because I'm a bot obviously!"
],
"patterns": [
"what",
"you",
"eat"
],
"keyword": [
"you",
"eat"
]
},
{
"tag": "advice",
"responses": [
"If I were you, I would go to the internet and type exactly what you wrote there!"
],
"patterns": [
"give",
"advice"
],
"keyword": [
"advice"
]
},
{
"tag": [
"test"
],
"responses": [
"200"
],
"patterns": [
"test"
],
"keyword": []
},
{
"tag": [
"Deep_Learning"
],
"responses": [
"relate"
],
"patterns": [
"deep",
"learn",
"relat",
"ai"
],
"keyword": []
},
{
"tag": [
"My_name"
],
"responses": [
"My name is z2"
],
"patterns": [
"name"
],
"keyword": [
"Your"
      ]
    }
  ]
}
This database looks long, but it really only has a few entries; as you keep training it later, the data will keep growing. Splitting the database into these fields is just for readability: responses holds the answers the bot replies with, patterns lists the words to search for in the user's message, and keyword lists the words that must appear in the message before that response can be chosen. tag is only there so you can see roughly what each entry is about; today's project does not use it. If you are not familiar with JSON, you can check out my earlier article.
(The code used this time is a bit long.)
import re
import json
import random
import string
import nltk
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords

# On a fresh install you may first need:
# nltk.download('punkt')
# nltk.download('stopwords')
def stop_words_and_tokenize(text: str):
    # Split the text into words, drop stop words and punctuation, then stem
    words = word_tokenize(text)
    stop_words = set(stopwords.words('english'))
    punctuation = set(string.punctuation)
    filtered_words = [word.lower() for word in words
                      if word.lower() not in stop_words and word.lower() not in punctuation]
    ps = PorterStemmer()
    stem_word = map(ps.stem, filtered_words)
    return list(stem_word)
def load_data(file_path: str):
    # Load the database file and return the list of entries
    with open(file_path, "r") as f:
        data: dict = json.load(f)
    return data["all_data"]
def unknown(user_text: str):
    # Pick a polite "I don't understand" reply at random and show it
    response = ["Could you please re-phrase that? ",
                "Sounds about right.",
                "What does that mean?"][random.randrange(3)]
    print('Bot: ' + response)
    # Ask the user to teach the bot: a tag for the new knowledge and a reply
    newknowledge = input("What does it mean? Can you teach me: ")
    respon = input("How should I respond: ")
    # Use the simplified words of the original message as the new patterns
    new_words = stop_words_and_tokenize(user_text)
    item = {"tag": [newknowledge], "responses": [respon], "patterns": new_words, "keyword": []}
    with open('data2.json') as f:
        config = json.load(f)
    config["all_data"].append(item)
    with open('data2.json', 'w') as f:
        json.dump(config, f, indent=2)
    return "thank you"
def message_probability(user_message, recognised_words, single_response=False, required_words=[]):
    message_certainty = 0
    has_required_words = True

    # A learned entry can end up with no patterns if every word was a stop word
    if not recognised_words:
        return 0

    # Count how many of the entry's pattern words appear in the user's message
    for word in user_message:
        if word in recognised_words:
            message_certainty += 1

    # Calculate the percentage of recognised words found in the user's message
    percentage = float(message_certainty) / float(len(recognised_words))

    # Check whether every required keyword appears in the message
    for word in required_words:
        if word not in user_message:
            has_required_words = False
            break

    if has_required_words or single_response:
        return int(percentage * 100)
    else:
        return 0
def check_all_messages(message):
    highest_prob_list = {}

    # Score one candidate response after filtering and record it in the dictionary
    def response(bot_response, list_of_words, single_response=False, required_words=[]):
        nonlocal highest_prob_list
        highest_prob_list[bot_response] = message_probability(message, list_of_words, single_response, required_words)

    n = load_data("data2.json")
    for i in n:
        response(i["responses"][0], i["patterns"], required_words=i["keyword"])

    best_match = max(highest_prob_list, key=highest_prob_list.get)
    print(highest_prob_list)
    print(f'Best match = {best_match} | Score: {highest_prob_list[best_match]}')

    # If nothing matched at all, switch to learning mode
    return unknown(' '.join(message)) if highest_prob_list[best_match] < 1 else best_match
# Used to get the bot's response to one line of user input
def get_response(user_input):
    split_message = re.split(r'\s+|[,;?!.-]\s*', user_input.lower())
    response = check_all_messages(split_message)
    print(split_message)
    return response
# Run the chat loop; every turn either answers or learns something new
while True:
    prompt_text = input("You: ")
    print('Bot: ' + get_response(prompt_text))
You can try typing things the database already contains, or type something new to see whether it records it. Since it is just plain conversation, I will not include screenshots here.
Import the required libraries.
stop_words_and_tokenize
Simplifies the input message.
stopwords.words('english')
Filters out the words that do not carry much meaning in a sentence (stop words).
ps.stem
Converts every word into a base form so it is easier to classify later.
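To see what these three steps do together, here is a quick sketch (sample output from a typical NLTK install; note the second sentence reduces to exactly the patterns stored under the Deep_Learning tag above):
print(stop_words_and_tokenize("What do you like to eat?"))
# -> ['like', 'eat']   (stop words and the '?' are dropped, the rest stemmed)
print(stop_words_and_tokenize("Is deep learning related to AI?"))
# -> ['deep', 'learn', 'relat', 'ai']   ('learning' -> 'learn', 'related' -> 'relat')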
load_data
Defines a function that loads all the data; nothing complicated here, you can work through it yourself.
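A quick check against the data2.json shown above:
entries = load_data("data2.json")
print(entries[0]["tag"])       # -> Hello
print(entries[0]["patterns"])  # -> ['hello', 'hi', 'hey', 'sup', 'heyo']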
unknown
When the bot gets an input it does not understand, it gives a polite response, then asks the user to explain what the message means and how to answer it, and saves that into the database. The next time the program starts it will know that input, so the database keeps improving.
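For example, suppose the user types "what is python" and nothing in the database matches. A teaching session might look like this (the tag and the reply are hypothetical user inputs; the debug prints are omitted):
You: what is python
Bot: What does that mean?
What does it mean? Can you teach me: python
How should I respond: Python is a programming language
Bot: thank you
Afterwards data2.json contains a new entry, with "what is python" reduced to its stemmed keywords:
{
  "tag": ["python"],
  "responses": ["Python is a programming language"],
  "patterns": ["python"],
  "keyword": []
}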
message_probability
A very simple algorithm: it just checks whether each pattern word appears in the sentence, adds to the score for every hit, and the highest-scoring entry wins.
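A quick worked example: for the input "hello there" and the Hello entry's patterns, only one of the five pattern words matches, so the score is 1 / 5 = 20:
score = message_probability(['hello', 'there'], ['hello', 'hi', 'hey', 'sup', 'heyo'])
print(score)  # -> int(1 / 5 * 100) = 20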
check_all_messages
Outputs the entry with the best score.
highest_prob_list
Holds a dictionary of how closely the message matches each possible response; the one with the highest probability is chosen as the reply.
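For instance, typing "hello" against the database above should print something like the following (abridged; every other entry scores 0, either because no pattern words match or because a required keyword is missing):
{'Hello!': 20, 'Hope to see you again': 0, 'I am doing fine, and you?': 0, 'You are welcome': 0, ...}
Best match = Hello! | Score: 20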
get_response
Strips punctuation from the input and converts everything to lowercase letters.
re.split
Removes the punctuation first, then splits the message into individual words for the later filtering.
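For example (note the trailing empty string that re.split leaves behind after the final '?'; it is harmless here because '' never appears as a pattern):
print(re.split(r'\s+|[,;?!.-]\s*', "Hello, how are you?".lower()))
# -> ['hello', 'how', 'are', 'you', '']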
while True
Calls the functions so the whole chat feature runs.
Honestly, today's content took me quite a while to put together, but I find it very interesting and easy to get into. If you think my article helped you, or you have better suggestions, feel free to follow me and share them in the comments. See you tomorrow.
Reference:
https://www.youtube.com/watch?v=CkkjXTER2KE